Microarray probe Tm matching by selective destabilization

ABSTRACT

Methods and systems for designing oligonucleotide probes for use in microarray applications are provided herein. The described methods use duplex melting temperature (Tm) matching to destabilize the hybridization of oligonucleotide probes to non-target sequences as compared to a target nucleotide sequence. Nucleic acid arrays containing probes selected by the described methods are also described.

CROSS REFERENCE TO RELATED APPLICATIONS

This Application is a continuation-in-part of, and claims priority to, U.S. patent application Ser. No. 10/996,323, filed Nov. 23, 2004, and now published as US 2006/0110744.

BACKGROUND

Comparative genomic hybridization (CGH) and location analysis are important applications, which allow scientists to improve their understanding of the expression and regulation of genes in biological systems. Both CGH and location analysis entail quantifying or measuring changes in copy number of genomic sequences. CGH is particularly important in the study of developmental biology and the causes of cancer, and offers great potential in the diagnostics of cancer and developmental diseases. Recently, cDNA microarrays have been used for CGH studies. An oligo-array based approach has several substantial advantages over other technologies, in that it allows the designer to position the probes anywhere within the genomic or polynucleotide sequence of interest. The probes can be placed at whatever density is commensurate with the real-estate or area available on the microarray (in terms of number of features) and the genomic regions of interest can be evaluated by analyzing the hybridization of target sequences to the surface-bound probes. The oligonucleotide probe approach also offers the flexibility of focusing in on regions within exons or introns of expressed sequences, or intergenic regions and regulatory regions for location analysis, as well as any desirable admixture of the aforementioned.

The oligonucleotide probe-based approach requires forming many thousands of probe-target hybrids under uniform conditions, and this requirement is a known source of error in microarray measurements. The discrimination between a target sequence of interest and competing sequences in a sample is greatest when the hybridization conditions are such that the hybrid formed between the probe and the desired target is stable, while hybrids between the probe and competing sequences are melted off. Probes are designed to maximize the differential stability between the hybrids with the desired targets and with competing sequences. The assay conditions must be chosen such that the melting point (T_(m)) of the desired hybrid is above the temperature of the assay, and the T_(m) for competing hybrids is below the assay temperature. This is difficult to achieve for thousands of probe-target hybrid pairs, but can be addressed through various methods of T_(m)-matching.
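The assay-temperature condition stated above can be written as a simple check. The following is only an illustrative sketch: the function name and the numeric T_(m) values are hypothetical and are not taken from this disclosure.

```python
from typing import Iterable


def discriminates(tm_target: float, tm_competitors: Iterable[float], assay_temp: float) -> bool:
    """Return True when the desired hybrid is stable at the assay temperature
    and every competing hybrid is melted (T_m below the assay temperature)."""
    return tm_target > assay_temp and all(tm < assay_temp for tm in tm_competitors)


# Hypothetical T_m values in degrees C, for illustration only.
print(discriminates(78.0, [61.5, 58.0], assay_temp=65.0))  # competitors all below 65
print(discriminates(78.0, [70.2, 58.0], assay_temp=65.0))  # one competitor above 65
```

A probe passes only when both halves of the condition hold, which is why a single stable competing hybrid is enough to reject it.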

For CGH arrays, where appropriate T_(m)-matched probes cannot always be found, it is usual to destabilize the more tightly bound hybrids (high T_(m) probes) so as to reduce their T_(m) to equal those of less tightly bound hybrids (low T_(m) probes). Typical methods include introducing arbitrarily selected mismatches or deletions into the probe sequence, or shortening the length of the probes. Such methods are effective for short probes (12-24 mers), but less effective for the 60-mer or longer probes used in microarray applications. The probe sequence shortening, or the mismatches in the sequence, may not correspond or coincide with the subsequence in the region of interest that hybridizes with the competing sequence. As a result, the modification of the probe (via truncation, deletion or substitution) destabilizes the target hybrid more than the competing hybrid, and fails to accomplish its purpose.

SUMMARY

This disclosure is directed to methods for designing and/or modifying a target-specific oligonucleotide probe. The methods as described herein provide for modifications of target-specific probes by substituting or deleting at least one nucleotide in a region of the probe that is complementary to a region of the target sequence that has the most homology with a non-target sequence. In some embodiments, the modification of the target-specific oligonucleotide probe decreases the computed T_(m) of the hybridization of the probe to the target sequence. In some embodiments, the computed T_(m) of the target-specific oligonucleotide probe to at least one non-target sequence is also reduced. In some embodiments, the decrease in computed T_(m) of the modified probe may decrease the stability of hybrids formed with non-target sequences that compete with the target sequence for binding to the oligonucleotide probe.

In some embodiments, the methods comprise identifying a target-specific oligonucleotide probe comprising a sequence complementary to a target nucleotide sequence and that has a computed T_(m) of about 65° C. or greater and modifying the target-specific oligonucleotide probe to decrease the T_(m) so that the modified target-specific probe hybridizes to at least one non-target sequence with a T_(m) the same or lower than the computed T_(m) of the hybridization of the modified target-specific nucleotide probe to the target nucleotide sequence.

In aspects, the present description provides methods for modifying or designing target-specific oligonucleotide probes for microarray applications. Candidate probes with sequences complementary to a target region of interest are identified. Using a computerized search engine, the sequence of the entire genome is searched to find all sequences that can form stable hybrids with the candidate probes (i.e. sequences with homology to the candidate probes). The most homologous sequences are selected, and the candidate probes are modified by deletion or substitution of one or more nucleotides in the candidate probe sequence. The deletion or substitution destabilizes the hybrid pair formed between the candidate probe and the undesired sequences by reducing the T_(m) for the hybrid pairs below the computed T_(m) of the hybrid between the probe and the desired target sequence. In some embodiments, candidate probes are selected such that (a) the hybrid between the destabilized probe and the desired target is not melted at the chosen assay temperature, and (b) the hybrids between the probe and all undesired homologous targets are melted at the chosen assay temperature, and (c) the melting temperatures of the desired and undesired hybrids are as different as possible.
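Criteria (a) through (c) can be read as a filter followed by a ranking. The sketch below is one possible reading, not the implementation of this disclosure: the Candidate class, its field names, and the example T_(m) values are all hypothetical, and the T_(m) values are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    tm_target: float            # computed T_m of the probe/target hybrid (deg C)
    tm_nontargets: List[float]  # computed T_m of each probe/non-target hybrid (deg C)


def separation(c: Candidate) -> float:
    """Criterion (c): gap between the target T_m and the most stable non-target T_m."""
    return c.tm_target - max(c.tm_nontargets, default=float("-inf"))


def select_best(cands: List[Candidate], assay_temp: float) -> Optional[Candidate]:
    # Criteria (a) and (b): target hybrid not melted, all non-target hybrids melted.
    passing = [
        c for c in cands
        if c.tm_target > assay_temp and all(t < assay_temp for t in c.tm_nontargets)
    ]
    # Criterion (c): among passing candidates, keep the widest separation.
    return max(passing, key=separation, default=None)


# Hypothetical candidates, for illustration only.
pool = [
    Candidate("A", 80.0, [60.0, 62.0]),
    Candidate("B", 75.0, [50.0]),
    Candidate("C", 70.0, [68.0]),
]
best = select_best(pool, assay_temp=65.0)
print(best.name if best else None)
```

Ranking by the gap to the most stable non-target hybrid is a design choice made here for simplicity; other measures of "as different as possible" could be substituted.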

Algorithms for performing the described methods recorded on a computer-readable medium, as well as computational analysis systems that include the same, are provided. Also provided are nucleic acid arrays with oligonucleotide probes selected according to the subject methods.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart generally depicting the methods described herein.

DETAILED DESCRIPTION

Various embodiments of the present invention will be described in detail with reference to the drawings, wherein like reference numerals represent like parts throughout the several views. Reference to various embodiments does not limit the scope of the invention, which is limited only by the scope of the claims attached hereto. Additionally, any examples set forth in this specification are not intended to be limiting and merely set forth some of the many possible embodiments for the claimed invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. Although any methods, devices and material similar or equivalent to those described herein can be used in practice or testing, the methods, devices and materials are now described.

All publications and patent applications in this specification are indicative of the level of ordinary skill in the art and are incorporated herein by reference in their entireties.

In this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural reference, unless the context clearly dictates otherwise.

Definitions

The terms “nucleic acid” and “polynucleotide” are used interchangeably herein to describe a polymer of any length, e.g., greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, usually up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions. The term “hybrid” refers to a double-stranded nucleic acid molecule formed by hybridization between complementary nucleotides. The terms “hybrid” and “hybrid pair” are used interchangeably herein.

The term “complementary,” “complement,” or “complementary nucleic acid sequence” refers to the nucleic acid strand that is related to the base sequence in another nucleic acid strand by the Watson-Crick base-pairing rules. In general, two sequences are complementary when the sequence of one can bind to the sequence of the other in an anti-parallel sense wherein the 3′-end of each sequence binds to the 5′-end of the other sequence and each A, T(U), G, and C of one sequence is then aligned with a T(U), A, C, and G respectively, of the other sequence. RNA sequences can also include complementary G/U or U/G basepairs.
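The anti-parallel pairing rule in this definition corresponds to what is commonly called the reverse complement. A minimal sketch for DNA sequences follows; the function names are illustrative only, and G/U wobble pairing in RNA is deliberately not handled.

```python
# Watson-Crick pairing for DNA: A<->T and G<->C (upper and lower case).
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse so the result reads 5' to 3'."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def is_complementary(a: str, b: str) -> bool:
    """True when a and b pair perfectly in the anti-parallel sense."""
    return a.upper() == reverse_complement(b).upper()


print(reverse_complement("ATGC"))
print(is_complementary("ATGC", "GCAT"))
```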

The terms “ribonucleic acid” and “RNA” as used herein mean a polymer composed of ribonucleotides.

The terms “deoxyribonucleic acid” and “DNA” as used herein mean a polymer composed of deoxyribonucleotides.

The term “oligonucleotide” as used herein denotes single stranded nucleotide multimers of from about 10 to 100 nucleotides and up to 200 nucleotides in length. Oligonucleotides are usually synthetic and, in many embodiments, are under 50 nucleotides in length.

The term “oligomer” is used herein to indicate a chemical entity that contains a plurality of nucleotide monomers, i.e., a nucleotide multimer. As used herein, the terms “oligomer” and “polymer” are used interchangeably, as it is generally, although not necessarily, smaller “polymers” that are prepared using the functionalized substrates of the invention, particularly in conjunction with combinatorial chemistry techniques. Examples of oligomers and polymers include polydeoxyribonucleotides (DNA), polyribonucleotides (RNA), other nucleic acids that are C-glycosides of a purine or pyrimidine base, polypeptides (proteins), polysaccharides (starches, or polysugars), and other chemical entities that contain repeating units of like chemical structure.

The term “sample” as used herein relates to a material or mixture of materials, typically, although not necessarily, in fluid form, containing one or more components of interest. Samples include, but are not limited to, biological samples obtained from natural biological sources, such as cells or tissue. The samples may also be derived from tissue biopsies and other clinical procedures.

The terms “nucleoside” and “nucleotide” are intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the terms “nucleoside” and “nucleotide” include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.

The phrase “surface-bound polynucleotide” refers to a polynucleotide that is immobilized on a surface of a solid substrate, where the substrate can have a variety of configurations, e.g., a sheet, bead, or other structure. In certain embodiments, the collections of oligonucleotide probe elements employed herein are present on a surface of the same planar support, e.g., in the form of an array.

The phrase “labeled population of nucleic acids” refers to a mixture of nucleic acids that are detectably labeled, e.g., fluorescently labeled, such that the presence of the nucleic acids can be detected by assessing the presence of the label. A labeled population of nucleic acids is often “made from” a biological DNA sample.

The term “array” encompasses the term “microarray” and refers to an ordered array presented for binding to nucleic acids and the like. An “array” includes any two-dimensional or substantially two-dimensional (as well as a three-dimensional) arrangement of spatially addressable regions bearing nucleic acids, particularly oligonucleotides or synthetic mimetics thereof, and the like. Where the arrays are arrays of nucleic acids, the nucleic acids may be adsorbed, physisorbed, chemisorbed, or covalently attached to the arrays at any point or points along the nucleic acid chain.

In those embodiments where an array includes two or more features immobilized on the same surface of a solid support, the array may be referred to as addressable. An array is “addressable” when it has multiple regions of different moieties (e.g., different oligonucleotide sequences) such that a region (i.e., a “feature” or “spot” of the array) at a particular predetermined location (i.e., an “address”) on the array will detect a particular sequence. Array features are typically, but need not be, separated by intervening spaces. In the case of an array in the context of the present application, the “population of labeled nucleic acids” will be referenced as a moiety in a mobile phase (typically fluid), to be detected by “surface-bound polynucleotides” which are bound to the substrate at the various regions. These phrases are synonymous with the arbitrary terms “target” and “probe”, or “probe” and “target”, respectively, as they are used in other publications.

A “scan region” refers to a contiguous (preferably, rectangular) area in which the array spots or features of interest, as defined above, are found or detected. Where fluorescent labels are employed, the scan region is that portion of the total area illuminated from which the resulting fluorescence is detected and recorded. Where other detection protocols are employed, the scan region is that portion of the total area queried from which resulting signal is detected and recorded. For the purposes of this invention and with respect to fluorescent detection embodiments, the scan region includes the entire area of the slide scanned in each pass of the lens, between the first feature of interest, and the last feature of interest, even if there are intervening areas that lack features of interest.

The term “substrate” as used herein refers to a surface upon which marker molecules or probes, e.g., an array, may be adhered. Glass slides are the most common substrate for biochips, although fused silica, silicon, plastic, flexible web and other materials are also suitable.

An “array layout” refers to one or more characteristics of the features, such as feature positioning on the substrate, one or more feature dimensions, and an indication of a moiety at a given location. “Hybridizing” and “binding”, with respect to nucleic acids, are used interchangeably. The terms “hybridizing,” “hybridizing specifically to,” and “specific hybridization” as used herein, refer to the binding, duplexing, or hybridizing of a nucleic acid molecule preferentially to a particular nucleotide sequence under stringent conditions.

The term “stringent assay conditions” as used herein refers to conditions that are compatible to produce binding pairs of nucleic acids, e.g., probes and targets, of sufficient complementarity to provide for the desired level of specificity in the assay while being incompatible to the formation of binding pairs between binding members of insufficient complementarity to provide for the desired specificity. The term stringent assay conditions refers to the combination of hybridization and wash conditions.

“Stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization (e.g., as in array, Southern or Northern hybridizations) are sequence dependent, and are different under different environmental parameters. Stringent hybridization conditions that can be used to identify nucleic acids within the scope of the invention can include, e.g., hybridization in a buffer comprising 50% formamide, 5×SSC, and 1% SDS at 42° C., or hybridization in a buffer comprising 5×SSC and 1% SDS at 65° C., both with a wash of 0.2×SSC and 0.1% SDS at 65° C. Exemplary stringent hybridization conditions can also include a hybridization in a buffer of 40% formamide, 1 M NaCl, and 1% SDS at 37° C., and a wash in 0.1×SSC at 45° C. Alternatively, hybridization to filter-bound DNA in 0.5 M NaHPO₄, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65° C., and washing in 0.1×SSC/0.1% SDS at 68° C. can be employed. Yet additional stringent hybridization conditions include hybridization at 60° C. or higher and 3×SSC (450 mM sodium chloride/45 mM sodium citrate) or incubation at 42° C. in a solution containing 30% formamide, 1 M NaCl, 0.5% sodium sarcosine, 50 mM MES, pH 6.5. Those of ordinary skill will readily recognize that alternative but comparable hybridization and wash conditions can be utilized to provide conditions of similar stringency.

In certain embodiments, the stringency of the wash conditions determines whether a nucleic acid is specifically hybridized to a probe. Wash conditions used to identify nucleic acids may include, e.g.: a salt concentration of about 0.02 M at pH 7 and a temperature of about 20° C. to about 40° C.; or, a salt concentration of about 0.15 M NaCl at 72° C. for about 15 minutes; or, a salt concentration of about 0.2×SSC at a temperature of about 30° C. to about 50° C. for about 2 to about 20 minutes; or, the hybridization complex is washed twice with a solution with a salt concentration of about 2×SSC containing 1% SDS at room temperature for 15 minutes and then washed twice by 0.1×SSC containing 0.1% SDS at 37° C. for 15 minutes; or, equivalent conditions. Stringent conditions for washing can also be, e.g., 0.2×SSC/0.1% SDS at 42° C. See Sambrook, Ausubel, or Tijssen (cited below) for detailed descriptions of equivalent hybridization and wash conditions and for reagents and buffers, e.g., SSC buffers and equivalent reagents and conditions.

A specific example of stringent assay conditions is rotating hybridization at 65° C. in a salt based hybridization buffer with a total monovalent cation concentration of 1.5 M (e.g., as described in U.S. patent application Ser. No. 09/655,482 filed on Sep. 5, 2000, the disclosure of which is herein incorporated by reference) followed by washes of 0.5×SSC and 0.1×SSC at room temperature.

Stringent hybridization conditions may also include a “prehybridization” of aqueous phase nucleic acids with complexity-reducing nucleic acids to suppress repetitive sequences. For example, certain stringent hybridization conditions include, prior to any hybridization to surface-bound polynucleotides, hybridization with Cot-1 DNA, or the like.

Stringent assay conditions are hybridization conditions that are at least as stringent as the above representative conditions, where a given set of conditions is considered to be at least as stringent if substantially no additional binding complexes that lack sufficient complementarity to provide for the desired specificity are produced in the given set of conditions as compared to the above specific conditions, where by “substantially no more” is meant less than about 5-fold more, typically less than about 3-fold more. Other stringent hybridization conditions are known in the art and may also be employed, as appropriate.

The term “mixture”, as used herein, refers to a combination of elements that are interspersed and not in any particular order. A mixture is heterogeneous and not spatially separable into its different constituents. Examples of mixtures of elements include a number of different elements that are dissolved in the same aqueous solution, or a number of different elements attached to a solid support at random or in no particular order in which the different elements are not especially distinct. In other words, a mixture is not addressable. To be specific, an array of surface-bound polynucleotides, as is commonly known in the art and described below, is not a mixture of capture agents because the species of surface-bound polynucleotides are spatially distinct and the array is addressable.

“Isolated” or “purified” generally refers to isolation of a substance (compound, polynucleotide, protein, polypeptide, chromosome, etc.) such that the substance comprises the majority percent of the sample in which it resides. Typically in a sample a substantially purified component comprises 50%, preferably 80%-85%, more preferably 90-95% of the sample. Techniques for purifying polynucleotides, polypeptides and intact chromosomes of interest are well-known in the art and include, for example, ion-exchange chromatography, affinity chromatography, sorting, and sedimentation according to density.

The terms “assessing” and “evaluating” are used interchangeably to refer to any form of measurement, and include determining if an element is present or not. The terms “determining,” “measuring,” “assessing,” and “assaying” are used interchangeably and include both quantitative and qualitative determinations. Assessing may be relative or absolute. “Assessing the presence of” includes determining the amount of something present, as well as determining whether it is present or absent.

The term “using” has its conventional meaning, and, as such, means employing, e.g., putting into service, a method or composition to attain an end. For example, if a program is used to create a file, a program is executed to make a file, the file usually being the output of the program. In another example, if a computer file is used, it is usually accessed, read, and the information stored in the file employed to attain an end. Similarly, if a unique identifier, e.g., a barcode, is used, the unique identifier is usually read to identify, for example, an object or file associated with the unique identifier.

A “target-specific oligonucleotide probe” means a polynucleotide which can specifically hybridize to a target nucleotide sequence, either in solution or as a surface-bound polynucleotide. A “non-target sequence” is a sequence sufficiently related (i.e. complementary) to the target nucleic acid sequence, such that the non-target nucleotide sequence interferes with the hybridization of the target-specific oligonucleotide probe with the target nucleotide sequence in the region of interest.

The term “T_(m)” refers to the melting temperature of two oligonucleotides which have formed a duplex structure (i.e. either hybrids of perfectly matched sequences, or hybrids of sequences containing mismatches or deletions). Computed T_(m) can be calculated using available software and methods known to the art. In some embodiments, the duplex T_(m) is measured empirically, by measuring the degree of hybridization at various temperatures.
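This disclosure does not specify which T_(m) computation is used. Purely as an illustration of what a computed T_(m) can look like, the sketch below uses one commonly cited GC-content approximation, T_(m) = 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N, with an assumed rule-of-thumb penalty of about 1° C. per 1% mismatch. The function names, default salt concentration, and mismatch penalty are assumptions; nearest-neighbor thermodynamic models are generally more accurate.

```python
import math


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm(seq: str, na_molar: float = 1.0, mismatch_percent: float = 0.0) -> float:
    """Approximate duplex T_m in deg C from GC content, length, and salt.

    The mismatch term (about 1 deg C per 1% mismatch) is an assumed
    rule of thumb, not a value taken from this disclosure.
    """
    n = len(seq)
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent(seq)
            - 675.0 / n - mismatch_percent)


probe = "ACGT" * 15  # hypothetical 60-mer with 50% GC
print(round(approx_tm(probe), 2))
print(round(approx_tm(probe, mismatch_percent=5.0), 2))
```

Under this approximation, introducing mismatches lowers the computed T_(m) in proportion to the mismatched fraction, which is the effect the destabilization methods rely on.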

Approaches and Methods for Target-Specific Probe Selection

The present methods provide alternative and novel methods and systems for designing or modifying target-specific probes for analysis such as CGH and location analysis in microarray applications that overcome the drawbacks of existing microarray probe selection techniques. General methods that utilize probe/target hybridization experiments and/or unique data analysis techniques to identify and select nucleotide probe(s) targeting polynucleotide fragments in a region of interest are described in U.S. Patent Publication No. 2006/0110744. The methods described herein provide for a more efficient design or modification of target-specific probes, thereby reducing to a minimum the number of such probes that are utilized in analyzing a target sequence within a region of interest.

The present description provides methods, systems and computer readable media for identifying and modifying nucleic acid probes for detecting a target with a nucleic acid probe array or microarray. In some embodiments, the methods comprise, in general terms: the selection of genomic nucleotide ranges of interest, determining appropriate target sequences for CGH and/or location analysis, generating candidate probes specific for the target sequences and analyzing candidate probes for specific probe properties by computational and/or experimental processes to optimize probe selection and reduce the number of probes to a value appropriate for placement on a microarray. The description also provides microarrays comprising probes designed or modified by the methods described herein. The microarrays comprise a solid support and a plurality of surface bound probes, the surface bound probes having very similar thermodynamic properties as well as similar GC content. In some embodiments, more specifically, a large portion of the probes utilized in the microarrays of the invention have duplex melting temperatures (T_(m)) which are within a narrow temperature range compared to the T_(m) range of possible genomic probes, such as, for example, genomic probes selected at random for a particular application.

The methods provided herein are particularly useful with comparative genome hybridization microarrays, such as microarrays based on the human or mouse genome. These methods permit more cost-effective and efficient identification of gene regions or sections which can be associated with human disease, points of therapeutic intervention, and potential toxic side-effects of proposed therapeutic entities.

The methods as described herein are methods for designing or modifying a target-specific oligonucleotide probe to increase the specific hybridization of the probe to the target sequence by choosing or designing probes that discriminate between the target nucleotide sequences and competing non-target sequences. In some embodiments, a method comprises designing or modifying a target-specific oligonucleotide probe comprising identifying a target-specific oligonucleotide probe comprising a sequence complementary to a target nucleotide sequence of interest. In some embodiments, the target-specific oligonucleotide and the target nucleotide sequence pair has a computed T_(m) of at least 65° C. A method further comprises modifying the sequence of the identified target-specific oligonucleotide probe to decrease the computed T_(m) so that the modified target-specific oligonucleotide probe hybridizes to at least one non-target sequence with a computed T_(m) lower than the computed T_(m) of the hybridization of the modified target-specific oligonucleotide probe to the target nucleotide sequence.

Identifying Target-Specific Oligonucleotide Probes

Target-specific oligonucleotide probes can be designed or selected by first identifying a target nucleotide sequence of interest. Target nucleotide sequences of interest include genomic nucleic acid sequences, RNA sequences from a cell, a particular gene, one or more regions of a chromosome, and the like.

Candidate target-specific probe sequences can be identified from a plurality of candidate probe sequences by searching sequences with a high homology to the target sequence of interest and identifying probes that can hybridize to the target sequence as well as one or more homologous non-target sequences. In some embodiments, a candidate target-specific oligonucleotide probe hybridizes or is complementary to the target sequence and at least two non-target sequences. In some embodiments, candidate probes that are too closely homologous to the non-target sequences, for example, greater than 50% sequence identity to one or more non-target sequences, are excluded. If a candidate target-specific probe sequence has a higher than desired computed T_(m) or higher homology than desired to non-target sequences, the probe can be designed or modified using the methods described herein.

As indicated in operation 102 in FIG. 1, a computerized algorithm can be used to search all sequences homologous to the candidate probe. In an aspect, homology algorithms can be used to interrogate known gene databases for naturally occurring sequences that are closest to the original (or candidate) probe sequence. Known homology algorithms or search engines that can be used with the present methods include BLAST (from NCBI, see S. F. Altschul et al., J. Mol. Biol. 215:403-10 (1990)), MegaBLAST (a variation on the BLAST search engine), BLAT, etc. A number of other homology-based algorithms are also known, such as thermodynamically-scored homology programs, for example, as described in U.S. Pat. No. 5,556,749.

In embodiments, the methods described herein use a homology search engine or algorithm or database that returns sequences with the lowest number of mismatched nucleotides against the candidate probe being subjected to T_(m)-matching. In an aspect, priority is given to those sequences within reasonably homologous regions or strains of the same or similar genomes. The search may also compare sequences based on specific factors, including thermodynamic factors, such as free energy. In an aspect, the search comprises homologous sequences with a computed T_(m) substantially the same as the predetermined T_(m) for the hybrid pair formed between the candidate probe and the target sequence.

Within the complex sample mixture, there may be nucleotides or nucleotide sequences that are most likely to form hybrids with the candidate probes (i.e. most likely to interfere or compete with probe-target hybridization), and thereby cause problems in probe design. In an aspect, using a homology search engine to identify sequences with homology to the candidate probe is advantageous, because all potential sequences within sequenced genomes can be identified. This allows a searcher to take into account all sequences that would potentially occur in an actual biological sample.

Various homology scoring mechanisms can be used to evaluate whether a particular competitor sequence is sufficiently homologous to the candidate probe. These include, without limitation, symbolic match score (score is based on the number of identical bases in a position-by-position comparison between the sequence of interest and a putative homologous sequence), ungapped BLAST score, gapped BLAST scores, thermodynamic scores, and score thresholds, for example. In aspects, duplex melting temperature or T_(m) is used to determine homology. In embodiments, sequences are selected based on the homology score. In an aspect, where symbolic match scores or BLAST scores are used, the sequences with the highest score are selected. Where T_(m) is used, sequences that show T_(m) differences of less than about 30° C. are selected for further analysis.
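Two of the scoring mechanisms named above, the symbolic match score and the T_(m)-difference threshold, can be sketched as follows. The function names and example sequences are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and the T_(m) values are assumed to have been computed elsewhere.

```python
from typing import Dict, List


def symbolic_match_score(seq: str, other: str) -> int:
    """Count identical bases in a position-by-position comparison."""
    if len(seq) != len(other):
        raise ValueError("sequences must be the same length (ungapped comparison)")
    return sum(1 for x, y in zip(seq.upper(), other.upper()) if x == y)


def top_by_match(seq: str, competitors: List[str]) -> str:
    """Select the competitor with the highest symbolic match score."""
    return max(competitors, key=lambda c: symbolic_match_score(seq, c))


def within_tm_window(tm_target: float, tm_by_name: Dict[str, float],
                     window: float = 30.0) -> List[str]:
    """Keep competitors whose T_m differs from the target T_m by less than `window`."""
    return [name for name, tm in tm_by_name.items() if abs(tm_target - tm) < window]


print(symbolic_match_score("ACGTACGT", "ACGTTCGT"))
print(top_by_match("ACGTACGT", ["TTTTTTTT", "ACGTTCGT"]))
print(within_tm_window(80.0, {"seq_a": 60.0, "seq_b": 45.0}))
```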

In some embodiments, candidate target-specific oligonucleotide probes are identified from a plurality of candidate probe sequences by searching for sequence homology to the target sequence and then further filtered using one or more criteria. For comparative genomic hybridization and location analysis, the candidate target-specific probes are identified and filtered as described in U.S. Publication No. 2006/0110744, which is hereby incorporated by reference.

In some embodiments, the candidate oligonucleotide probes have a nucleotide length in the range of at least 25 to about 200 nucleotides. In some embodiments, the length of the probes ranges from 30 to 100 nucleotides. In some embodiments, at least 50% of the nucleotide probes on the solid support have the same length and the length may be about 60 nucleotides. In some embodiments, candidate probes may have a subsequence less than the full length sequence that has a greater degree of homology to the target sequence and/or one or more target sequences. The subsequences can range from about 15 to 190 nucleotides, about 25 to 150 nucleotides, or about 25-55 nucleotides.

In some embodiments, a candidate target-specific oligonucleotide probe has a computed T_(m) of 65° C. or greater to the target nucleotide sequence. In some embodiments, a candidate probe may have a higher than desired T_(m) due to a subsequence that has a higher percentage of GC content, for example, 40% or greater GC content over at least 15-25 nucleotides of the probe. Subsequences of higher GC % content can serve as nucleation sites for non-specific binding to non-target nucleotide sequences. In some embodiments, a candidate probe has a T_(m) for the target sequence of 65 to 95° C. In other embodiments, a candidate target-specific oligonucleotide probe has a T_(m) of 70 to 95° C., about 75 to 95° C., about 80 to 95° C., or about 85° C. or greater.
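As a minimal sketch of how GC-rich subsequences of the kind described above might be located, a sliding window can be scanned along the probe. The 15-nucleotide window and the 40% cutoff follow the example figures given above; the function names and the sliding-window approach itself are assumptions for illustration.

```python
# Illustrative sketch only; names and the sliding-window approach are assumptions.

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_rich_windows(probe: str, window: int = 15, min_gc: float = 0.40) -> list:
    """Return (start, gc_fraction) for every window at or above min_gc."""
    hits = []
    for start in range(len(probe) - window + 1):
        frac = gc_fraction(probe[start:start + window])
        if frac >= min_gc:
            hits.append((start, frac))
    return hits
```

Windows returned by such a scan would mark candidate regions for the substitutions or deletions discussed in the paragraphs that follow.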

In a method of the disclosure, the sequence of a target-specific oligonucleotide probe is designed or modified to decrease the computed T_(m) to the target nucleotide sequence so that the modified target-specific oligonucleotide probe hybridizes to at least one non-target nucleotide sequence with a computed T_(m) the same as or lower than the computed T_(m) for hybridization to the target nucleotide sequence. In some embodiments, the computed T_(m) of the modified or designed probes with at least one non-target sequence is 65° C. or less.

In some embodiments, designing or modifying the target-specific oligonucleotide probe further comprises identifying the at least one non-target nucleotide sequence that is homologous to the complement of the probe sequence, which is the same length as the region of the target sequence that is complementary to the probe sequence. At least one nucleotide in the target-specific oligonucleotide probe is deleted or substituted in a region of the target-specific oligonucleotide probe that is complementary to or hybridizes to a sequence of the non-target sequence that has the highest homology to the target nucleotide sequence to form a first modified probe. In some embodiments, the sequence of highest homology is a region or subsequence of the non-target sequence. In some embodiments, the non-target sequence is longer than the target sequence, the same length as the target sequence or shorter than the target sequence. The % homology between target and at least one non-target nucleotide sequence can be determined using standard methods, such as BLAST. When the target and non-target sequence are the same length, the region of highest homology between them can include about 15 nucleotides up to the full-length sequence.

In some embodiments, a subsequence is at least about 15 nucleotides, about 20 nucleotides, or about 25 nucleotides or greater. In some embodiments, the subsequence can have a % GC content of at least 40% to about 100%, 50% to 100%, 60 to 100%, 70 to 100%, 80 to 100%, or 90 to 100%. In some embodiments, the at least one nucleotide that is deleted or substituted is located in the middle of the subsequence or the non-target sequence. In other embodiments, the at least one nucleotide is substituted to eliminate a GC pair. In some embodiments, the at least one nucleotide to be substituted or deleted in the target nucleotide is complementary to both the target nucleotide sequence and the at least one non-target sequence. The nucleotide can be substituted or deleted. More than one nucleotide in the target-specific oligonucleotide probe can be substituted or deleted until the desired number of non-target sequences have a computed T_(m) lower than the computed T_(m) for the target sequence.
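The substitution and deletion operations described above can be illustrated with a short sketch. The choice of replacement bases (G to A, C to T), the tie-breaking rule toward the lower index, and all function names are assumptions for this example; the description does not prescribe them.

```python
# Illustrative sketch only; replacement bases and names are assumptions.
from typing import Optional

SUBSTITUTE = {"G": "A", "C": "T"}  # swap a G/C base for A/T to remove a GC pair


def nearest_gc_index(probe: str, start: int, end: int) -> Optional[int]:
    """Index of the G or C closest to the middle of probe[start:end], or None."""
    mid = (start + end - 1) / 2
    candidates = [i for i in range(start, end) if probe[i] in "GC"]
    if not candidates:
        return None
    # Ties are broken toward the lower index (an arbitrary choice).
    return min(candidates, key=lambda i: (abs(i - mid), i))


def eliminate_gc_pair(probe: str, start: int, end: int) -> str:
    """Substitute the most central G/C base in probe[start:end]."""
    i = nearest_gc_index(probe, start, end)
    if i is None:
        return probe
    return probe[:i] + SUBSTITUTE[probe[i]] + probe[i + 1:]


def delete_base(probe: str, index: int) -> str:
    """Delete the single base at the given index."""
    return probe[:index] + probe[index + 1:]
```

For example, under these assumptions `eliminate_gc_pair("AAGCGCAA", 2, 6)` changes the C at index 3 to T and returns `"AAGTGCAA"`.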

In some embodiments of a method, the target-specific oligonucleotide can be designed or modified to decrease the computed T_(m) to at least one non-target sequence to the same as or lower than the computed T_(m) to the target sequence. Decreasing the T_(m) to non-target sequences increases the specific hybridization of the probe to the target sequence by decreasing the number of non-target sequences that compete with the target sequence for hybridization to the probe.

In some embodiments, the first modified probe may be further modified by substituting or deleting at least one nucleotide in a region of the first modified probe that is complementary to a region of a second non-target sequence that has the second most homology to the target nucleotide sequence. For example, once a target nucleotide sequence is identified, the sequence can be used to search for and identify homologous sequences that form a plurality of target-specific candidate probe sequences that are complementary to the target and non-target sequences. The plurality of homologous sequences includes both target and non-target sequences. The plurality of homologous sequences comprises sequences that have at least 50%, 60%, 70%, 80%, 90%, 95% or 100% sequence identity to the target nucleotide sequence. In some embodiments, the region of greatest homology between a target and a non-target sequence is at least 15 nucleotides to 200 nucleotides, 20 nucleotides to 100 nucleotides, or 25 to 50 nucleotides. Such searching for sequence identity can be conducted using methods and databases known in the art. From the plurality of candidate probe sequences, candidate probes can be designed or modified to have optimal hybridization to the target sequence while minimizing hybridization to non-target sequences that might be in the sample. In some embodiments, the probe is designed or modified so at least one nucleotide in a region of the probe that is complementary to that of a region of high homology between at least one or two non-target sequences and the target sequence is substituted or deleted.

In some embodiments, a method further comprises identifying a second non-target sequence that has the second most homology to the target nucleotide sequence and modifying the first modified target-specific oligonucleotide probe by substituting or deleting at least one nucleotide in the probe sequence that is complementary to a region of the second non-target sequence that has the second most homology to the target nucleotide sequence to form a second modified probe. The second modified probe has a computed T_(m) to the second non-target nucleotide sequence that is the same or lower than the computed T_(m) of the second modified probe to the target nucleotide sequence.

Multiple modifications can be made in accord with the method described above until the desired number of non-target sequences no longer hybridize to the target-specific oligonucleotide under conditions of the assay or have a lower computed T_(m) to the designed or modified probe. Typically, that means that the computed T_(m) of the target-specific oligonucleotide to at least one of the non-target nucleotide sequences is less than 65° C. In some embodiments, the computed T_(m) of the target-specific oligonucleotide probe to the target nucleotide sequence is at least 65° C., 65 to 95° C., 70 to 95° C., about 75 to 95° C., 80 to 95° C., or 85° C. or greater. In some embodiments of the method described above, modifications are made until the computed T_(m) of the target-specific oligonucleotide probe to at least one non-target nucleotide sequence decreases at least 1° C., 2° C., 3° C., 4° C. or 5° C. as compared to the computed T_(m) to the target sequence. In some embodiments, the computed T_(m) to at least one non-target sequence is decreased at least 1 to about 25° C., 1 to about 20° C., 1 to about 15° C., 1 to about 10° C., or 1 to about 5° C. as compared to that of the computed T_(m) to the target nucleotide sequence.
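The iterative modification described above can be sketched, under stated assumptions, as a loop that keeps substituting probe bases until the computed T_(m) to each non-target sequence falls below the assay temperature while the computed T_(m) to the target stays at or above it. The T_(m) model used here (81.5 + 0.41 x %GC - 675/N, less about 1° C. per 1% mismatch) is a crude rule-of-thumb stand-in chosen only to make the sketch runnable; the description does not specify how T_(m) is computed. All names are hypothetical, sequences are written in probe orientation (so a perfect match means the probe equals the site), and all sequences are assumed to have equal length.

```python
# Illustrative sketch only; the Tm model and all names are assumptions.
SUBSTITUTE = {"G": "A", "C": "T"}  # swap a G/C base for A/T to remove a GC pair


def approx_tm(probe: str, site: str) -> float:
    """Crude duplex Tm estimate in deg C (assumed model, not from the source)."""
    n = len(probe)
    gc_pct = 100.0 * sum(base in "GC" for base in probe) / n
    mismatch_pct = 100.0 * sum(p != s for p, s in zip(probe, site)) / n
    return 81.5 + 0.41 * gc_pct - 675.0 / n - mismatch_pct


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def shared_gc_positions(probe: str, target: str, nontarget: str) -> list:
    """G/C positions where probe, target and non-target agree, most central first."""
    mid = (len(probe) - 1) / 2
    positions = [i for i in range(len(probe))
                 if probe[i] in "GC" and probe[i] == target[i] == nontarget[i]]
    return sorted(positions, key=lambda i: (abs(i - mid), i))


def destabilize(probe: str, target: str, nontargets: list,
                assay_temp: float = 65.0) -> str:
    """Substitute bases until each non-target Tm is below assay_temp.

    Non-targets are handled in order of decreasing identity to the target.
    Modification stops early if the target Tm would drop below assay_temp.
    """
    for nontarget in sorted(nontargets, key=lambda s: -identity(target, s)):
        for i in shared_gc_positions(probe, target, nontarget):
            if approx_tm(probe, nontarget) < assay_temp:
                break  # this non-target is already destabilized enough
            candidate = probe[:i] + SUBSTITUTE[probe[i]] + probe[i + 1:]
            if approx_tm(candidate, target) < assay_temp:
                break  # further changes would destabilize the target hybrid
            probe = candidate
    return probe
```

Positions are restricted to bases shared by the target and the non-target so that a substitution cannot accidentally create a new match to the non-target; a production design would replace `approx_tm` with a proper thermodynamic calculation.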

In some embodiments, a method further comprises identifying at least one homologous nucleotide sequence to the first modified target-specific oligonucleotide probe and identifying a second non-target sequence that has the most homology to the complement of the first modified target-specific oligonucleotide probe. The method further comprises modifying the first modified probe by substituting or deleting at least one nucleotide in the sequence of the first modified probe that is complementary to a region of the second non-target sequence that has the most homology to the complement of the first modified probe to form a second modified probe having a computed T_(m) for hybridization to the second non-target sequence that is the same or lower than that of the T_(m) to the target nucleotide sequence. Multiple modifications to the target-specific oligonucleotide probe can be made until the desired T_(m) to at least one non-target sequence is obtained as described above.

Arrays

The present description also provides nucleic acid microarrays produced using the subject methods, as described herein. The subject arrays include at least two distinct nucleic acids that differ by monomeric sequence immobilized on, e.g., covalently to, different and known locations on the substrate surface. In certain embodiments, each distinct nucleic acid sequence of the array is typically present as a composition of multiple copies of the polymer on the substrate surface, e.g., as a spot on the surface of the substrate. The number of distinct nucleic acid sequences, and hence spots or similar structures, present on the array may vary, but is generally at least 2, usually at least 5 and more usually at least 10, where the number of different spots on the array may be as high as 50, 100, 500, 1000, 10,000 or higher, depending on the intended use of the array. The spots of distinct polymers present on the array surface are generally present as a pattern, where the pattern may be in the form of organized rows and columns of spots, e.g., a grid of spots, across the substrate surface, a series of curvilinear rows across the substrate surface, e.g., a series of concentric circles or semi-circles of spots, and the like. The density of spots present on the array surface may vary, but will generally be at least about 10 and usually at least about 100 spots/cm², where the density may be as high as 10⁶ or higher, but will generally not exceed about 10⁵ spots/cm². In other embodiments, the polymeric sequences are not arranged in the form of distinct spots, but may be positioned on the surface such that there is substantially no space separating one polymer sequence/feature from another. An exemplary array is described in U.S. Patent Publication No. 20050095596, which is incorporated herein by reference.

Arrays can be fabricated using drop deposition from pulsejets of either polynucleotide precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained polynucleotide. Such methods are described in detail in, for example, the previously cited references including U.S. Pat. Nos. 6,242,266, 6,232,072, 6,180,351, 6,171,797, 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. These references are incorporated herein by reference. Other drop deposition methods can be used for fabrication, as previously described herein.

A feature of the subject arrays is that they include one or more, usually a plurality of, oligonucleotide probes. The oligonucleotide probes selected according to the subject methods are suitable for use in a plurality of different gene expression or genomic microarray applications.

In using an array, the array will typically be exposed to a sample (for example, a fluorescently labeled analyte, such as a sample containing genomic DNA) and the array then read. Reading of the array may be accomplished by illuminating the array and reading the location and intensity of resulting fluorescence at each feature of the array to detect any binding complexes on the surface of the array. For example, a scanner may be used for this purpose that is similar to the AGILENT MICROARRAY SCANNER available from Agilent Technologies, Palo Alto, Calif. Other suitable apparatus and methods are described in U.S. patent applications: Ser. No. 09/846,125 “Reading Multi-Featured Arrays” by Dorsel et al.; and Ser. No. 09/430,214 “Interrogating Multi-Featured Arrays” by Dorsel et al. As previously mentioned, these references are incorporated herein by reference. However, arrays may be read by any other method or apparatus than the foregoing, with other reading methods including other optical techniques (for example, detecting chemiluminescent or electroluminescent labels) or electrical techniques (where each feature is provided with an electrode to detect hybridization at that feature in a manner disclosed in U.S. Pat. No. 6,221,583 and elsewhere). Results from the reading may be raw results (such as fluorescence intensity readings for each feature in one or more color channels) or may be processed results such as obtained by rejecting a reading for a feature which is below a predetermined threshold and/or forming conclusions based on the pattern read from the array (such as whether or not a particular target sequence may have been present in the sample or an organism from which a sample was obtained exhibits a particular condition).

The results of the reading (processed or not) may be forwarded (such as by communication) to a remote location if desired, and received there for further use (such as further processing). By “remote location” is meant a location other than the location at which the array is present and hybridization occurs. For example, a remote location could be another location (e.g. office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being “remote” from another, what is meant is that the two items are at least in different buildings, and may be at least one mile, ten miles, or at least one hundred miles apart. “Communicating” information means transmitting the data representing that information as electrical signals over a suitable communication channel (for example, a private or public network). “Forwarding” an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data. The data may be transmitted to the remote location for further evaluation and/or use. Any convenient telecommunications means may be employed for transmitting the data, e.g., facsimile, modem, internet, etc.

Designing a microarray involves determining the amount of “real estate” (number of probes) that is available for the final array. The array designer also determines the amount of probes or “real estate” to use for specified regulatory regions and intergenic regions, as well as the amount of probes necessary to adequately cover introns and exons of the chromosomes of interest. Initially, a designer will generate 20 to 40 million candidate probes and need to filter the probes for certain probe properties or parameters to obtain a final array with approximately 250,000 probes. Intermediate arrays are manufactured in some embodiments of the methods of the invention, which have a redundancy of 3 or 4 fold over the number of probes selected for the final array. These intermediate arrays are utilized to screen candidate probes for certain probe properties by direct or indirect experimentation.
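As a worked illustration of the figures above, the following sketch computes the size of an intermediate array and the fraction of candidate probes that survive filtering. The function names are hypothetical, and the reading of "3 or 4 fold redundancy" as a simple multiple of the final probe count is an assumption.

```python
# Illustrative arithmetic only; names and the reading of "redundancy" are assumptions.

def intermediate_array_size(final_probes: int, redundancy: int) -> int:
    """Probe count of an intermediate array with the given fold redundancy."""
    return final_probes * redundancy


def retained_fraction(final_probes: int, candidates: int) -> float:
    """Fraction of the candidate probes that reach the final array."""
    return final_probes / candidates
```

With a 250,000-probe final array, this gives intermediate arrays of 750,000 to 1,000,000 probes, and a retained fraction of roughly 0.6% to 1.25% of 40 to 20 million candidates.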

Standard hybridization techniques (using high stringency hybridization conditions) are used to probe the subject array. Suitable methods are described in references describing, for example, CGH techniques (Kallioniemi et al., Science 258:818-821 (1992) and WO 93/18186). Several guides to general techniques are available, e.g., Tijssen, Hybridization with Nucleic Acid Probes, Parts I and II (Elsevier, Amsterdam 1993). For descriptions of techniques suitable for in situ hybridizations see Gall et al., Meth. Enzymol. 21:470-480 (1981) and Angerer et al. in Genetic Engineering: Principles and Methods (Setlow and Hollander, eds.), vol. 7, pp. 43-65 (Plenum Press, New York 1985). See also U.S. Pat. Nos. 6,335,167; 6,197,501; 5,830,645; and 5,665,549; the disclosures of which are incorporated herein by reference.

In embodiments, the present description provides methods for selecting oligonucleotide probes that are specific to a target nucleic acid sequence within a region of interest. The probes are selected by a method that optimally discriminates between desired homologous target sequences and competing undesired sequences within a genome, transcription, or other known complex background, for example. Discrimination between the target of interest and competing sequences is optimal (i.e. most sensitive) when the hybridization conditions are such that the hybrid formed between the probe and the target sequence is stable, while hybrids between the probe and one or more competing sequences are unstable. Such unstable hybrids are melted off at the temperature at which the hybridization is performed. A hybrid pair with a melting temperature (T_(m)) less than the temperature of the stringent hybridization conditions of the assay (e.g. 65° C.) is considered unstable, and hybrid pairs melting above the temperature of the assay are stable. In many cases, however, the undesired targets form hybrid pairs with the probe which melt at too high a temperature (i.e. high T_(m)), and interfere with the assay. This occurs particularly in GC-rich (“hot”) regions of the genome, where in many cases no probes can be found with the desired probe-target T_(m). One method of designing probes for targets in hot regions is to destabilize the probe-target hybrid by truncation, deletion of nucleotides, or mismatching nucleotides. It is important, however, to destabilize not only the desired probe-target hybrid, but also hybrids with undesired competing targets.
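The stability criterion described above can be expressed as a simple comparison against the assay temperature. In the following sketch the function names are hypothetical, and treating a hybrid whose T_(m) exactly equals the assay temperature as stable is an assumption, since the description addresses only T_(m) values below and above that temperature.

```python
# Illustrative sketch only; names and the handling of Tm == assay_temp are assumptions.

def classify_hybrids(tms: dict, assay_temp: float = 65.0) -> dict:
    """Label each hybrid 'stable' or 'unstable' relative to the assay temperature."""
    return {name: ("stable" if tm >= assay_temp else "unstable")
            for name, tm in tms.items()}


def probe_discriminates(target_tm: float, competitor_tms: list,
                        assay_temp: float = 65.0) -> bool:
    """True if the target hybrid is stable and every competing hybrid is not."""
    return target_tm >= assay_temp and all(tm < assay_temp
                                           for tm in competitor_tms)
```

A probe in a GC-rich region would fail `probe_discriminates` when a competing hybrid melts above the assay temperature, which is the situation the destabilization method is meant to address.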

Systems

The methods described herein are carried out in part with the aid of a computer-based system, driven by software specific to the methods. A “computer-based system” refers to the hardware, software, and data storage used to analyze the information of the present disclosure. Typical hardware of the computer-based systems of the present disclosure comprises a central processing unit (CPU), input, output, and data storage. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present disclosure. The data storage means may comprise any manufacture comprising a recording of the present information as described above, or a memory access means that can access such a manufacture. In certain instances a computer-based system may include one or more wireless devices.

To “record” data, programming or other information on a computer-readable medium refers to a process for storing information on a recordable storage medium, using any such methods as known in the art. Examples include magnetic media such as hard drives, tapes, disks, and the like. Optical media can include CDs, DVDs, and the like. Any convenient data storage structure may be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g., word processing text file, database format, etc.

A “processor” references any hardware and/or software combination that will perform the functions required of it. For example, any processor herein may be a programmable digital microprocessor such as available in the form of an electronic controller, mainframe, server or personal computer (desktop or portable). Where the processor is programmable, suitable programming can be communicated from a remote location to the processor, or previously saved in a computer program product (such as a portable or fixed computer readable storage medium, whether magnetic, optical or solid state device based). For example, a magnetic medium or optical disk may carry the programming, and can be read by a suitable reader communicating with each processor at its corresponding station.

In aspects, the methods described herein are performed using computer-readable media containing programming stored thereon implementing the subject methods. The computer-readable media may be, for example, in the form of a computer disk or CD, a floppy disk, a magnetic “hard card”, a server, or any other computer-readable media capable of containing data or the like, stored electronically, magnetically, optically or by other means. Accordingly, stored programming embodying steps for carrying out the subject methods may be transferred to a computer such as a personal computer (PC), (i.e. accessible by a researcher or the like), by physical transfer of a CD, floppy disk, or like medium, or may be transferred using a computer network, server, or any other interface connection, e.g., the Internet.

In an embodiment, the system described herein may include a single computer or the like with a stored algorithm capable of evaluating probe performance, as described herein, i.e. a computational analysis system that performs statistical regression analysis on a set of training data. In certain embodiments, the system is further characterized in that it provides a user interface, where the user interface presents to a user the option of selecting among one or more different, or multiple different inputs. For example, in the systems described herein, the user has the option of selecting various predictive parameters, such as composition factors, thermodynamic factors, kinetic factors, and mathematical combinations of such factors, as well as analogous parameters for the intended genomic targets. Computational systems that may be readily modified to become systems of the subject invention include those described in U.S. Pat. No. 6,251,588, the disclosure of which is incorporated herein by reference.

The various embodiments described above are provided by way of illustration only and should not be construed to limit the invention. Those skilled in the art will readily recognize various modifications and changes that may be made to the present invention without following the example embodiments and applications illustrated and described herein, and without departing from the true spirit and scope of the present invention, which is set forth in the following claims.

1. A method for designing a target-specific oligonucleotide probe comprising: a) identifying a target-specific oligonucleotide probe comprising a sequence complementary to a target nucleotide sequence of interest and that has a computed T_(m) of about 65° C. or greater; and b) modifying the sequence of the identified target-specific oligonucleotide probe to decrease the computed T_(m) so that the modified target-specific oligonucleotide probe hybridizes to at least one non-target nucleotide sequence with a computed T_(m) lower than the computed T_(m) of the hybridization of the modified target-specific oligonucleotide probe to the target nucleotide sequence.
2. The method of claim 1, wherein the target-specific oligonucleotide probe hybridizes to the target nucleotide sequence at a computed T_(m) at about 75° C. or greater.
3. The method of claim 1, wherein the target-specific oligonucleotide probe can hybridize to the target nucleotide sequence and one or more non-target nucleotide sequences.
4. The method of claim 1, wherein designing the target-specific oligonucleotide probe further comprises: a) identifying the at least one non-target nucleotide sequence that is homologous to the complement of the probe sequence; and b) deleting or substituting at least one nucleotide in a region of the target-specific oligonucleotide probe that is complementary to a region of the non-target nucleotide sequence that has the highest homology to the target nucleotide sequence to form a first modified target-specific oligonucleotide probe having a computed T_(m) to the at least one non-target nucleotide sequence that is the same or lower than the computed T_(m) of the first modified target-specific oligonucleotide probe to the target nucleotide sequence.
5. The method of claim 4, further comprising: c) identifying a second non-target nucleotide sequence that has the second most homology to the target nucleotide sequence; and d) modifying the first modified target-specific oligonucleotide probe by substituting or deleting at least one nucleotide in the sequence that is complementary to a region of the second non-target nucleotide sequence that has the second most homology to the target nucleotide sequence to form a second modified probe having a computed T_(m) to the second non-target nucleotide sequence that is the same or lower than the computed T_(m) of the second modified probe to the target nucleotide sequence.
6. The method of claim 4, further comprising: a) identifying at least one homologous nucleotide sequence to the first modified target-specific oligonucleotide probe and identifying a second non-target nucleotide sequence that has the most homology to the complement of the first modified target-specific oligonucleotide probe; and b) modifying the first modified target-specific oligonucleotide probe by substituting or deleting at least one nucleotide in the sequence of the first modified probe that is complementary to a region of the second non-target sequence that has the most homology to the complement of the first modified target-specific oligonucleotide probe to form a second modified probe having a computed T_(m) to the second non-target nucleotide sequence the same or lower than that of the computed T_(m) of the second modified probe to the target nucleotide sequence.
7. The method of claim 1, wherein the target-specific oligonucleotide probe is complementary to the target nucleotide sequence and at least two other non-target nucleotide sequences.
8. The method of claim 1, wherein at least one nucleotide is deleted.
9. The method of claim 1, wherein at least one nucleotide is substituted.
10. The method of claim 1, wherein the target-specific oligonucleotide probe is at least 25 nucleotides long.
11. The method of claim 7, wherein the target nucleotide sequence and the at least two other non-target nucleotide sequences have at least 80% homology over at least 25 nucleotides.
12. The method of claim 1, wherein steps a) and b) are repeated until a computed T_(m) of the modified target-specific oligonucleotide probe to at least one of the non-target sequences has a decrease in computed T_(m) of at least 1° C.
13. The method of claim 1, wherein the region of the non-target sequence that has the most sequence homology to the target sequence has a % GC content of at least 40%.
14. The method of claim 13, wherein at least one nucleotide is substituted in the target-specific oligonucleotide probe to eliminate a GC pair.
15. The method of claim 1, wherein at least one nucleotide that is modified is located in the middle of the region of the target-specific oligonucleotide probe that is complementary to the region of most homology between the target and non-target sequence.